[SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block#17189
Conversation
…plication to avoid the block
|
I think you need to call |
| logInfo(s"Cleaning up local directories for application $id") | ||
| dirList.foreach { dir => | ||
| Utils.deleteRecursively(new File(dir)) | ||
| cleanupApplicationThreadExecutor.submit(new Runnable { |
There was a problem hiding this comment.
I prefer to just reuse the existing cleanupThreadExecutor like this:
appDirectories.remove(id).foreach { dirList =>
concurrent.Future {
logInfo(s"Cleaning up local directories for application $id")
dirList.foreach { dir =>
Utils.deleteRecursively(new File(dir))
}
}(cleanupThreadExecutor).onFailure {
case e: Throwable =>
logError(s"Clean up app dir $dirList failed: ${e.getMessage}", e)
}(cleanupThreadExecutor)
}
shuffleService.applicationRemoved(id)There was a problem hiding this comment.
I agree with you, but I am worried that cleaning up the workDir and application all cost mush time.
There was a problem hiding this comment.
That won't become an issue. The worst case is there will be some pending tasks in the queue of cleanupThreadExecutor. Considering the number of applications is not huge, it won't be an issue.
|
ok to test |
|
Test build #74220 has finished for PR 17189 at commit
|
|
Test build #74266 has finished for PR 17189 at commit
|
|
|
||
| // A separated thread to clean up the workDir. Used to provide the implicit parameter of `Future` | ||
| // methods. | ||
| // A separated thread to clean up the workDir and the finished application. |
There was a problem hiding this comment.
nit: the directories of finished applications.
|
@hustfxj Looks pretty good. Could you update the PR title and description to reflect the latest changes? |
|
@zsxwing Of course, Thank you for your reminding |
|
Test build #74397 has finished for PR 17189 at commit
|
|
LGTM. Merging to master. Thanks! |
Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend ThreadSafeRpcEndpoint. If the heartbeat from a worker is blocked by the message ApplicationFinished, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again.
It had better reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block.